Add read-only Config endpoint #9497

Merged: 14 commits into apache:master, Jun 26, 2020

22quinn (Contributor) commented Jun 24, 2020

Closes #8136


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target GitHub issue in description, if one exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

boring-cyborg (bot) added the area:API (Airflow's REST/HTTP API) label Jun 24, 2020
22quinn marked this pull request as draft June 24, 2020 11:59
22quinn (Contributor Author) commented Jun 24, 2020

I see there are pagination parameters:

parameters:
- $ref: '#/components/parameters/PageLimit'
- $ref: '#/components/parameters/PageOffset'

But pagination seems odd for config to me, and I'd prefer to remove it. WDYT, @mik-laj?

mik-laj (Member) commented Jun 24, 2020

@zikun I agree. That's weird. Let's delete these parameters.

@@ -1760,6 +1757,9 @@ components:
value:
  type: string
  readOnly: true
source:
Member commented

I think API clients can't do anything with this information. They won't change their behavior just because a value comes from an environment variable.

Contributor Author commented

This is inspired by the table on the web UI's configuration page, which has four columns: section, key, value, and source. Isn't the source useful for admins who need to change or debug the configuration, especially when values come from multiple places such as airflow.cfg, environment variables, or cmd?
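The four-column view described above amounts to flattening a nested mapping into (section, key, value, source) rows. This is a hypothetical sketch: it assumes the configuration is available as a dict of sections, each mapping option names to a (value, source) pair, which is my understanding of what Airflow's conf.as_dict(display_source=True) returns (treat that as an assumption). The function name config_rows and the sample data are illustrative only.

```python
from typing import Dict, List, Tuple

# Assumed input shape: {section: {key: (value, source)}}.
ConfDict = Dict[str, Dict[str, Tuple[str, str]]]
Row = Tuple[str, str, str, str]


def config_rows(conf_dict: ConfDict) -> List[Row]:
    """Flatten nested config into (section, key, value, source) rows."""
    rows: List[Row] = []
    for section, options in conf_dict.items():
        for key, (value, source) in options.items():
            rows.append((section, key, value, source))
    return rows


# Illustrative data only; real values and sources depend on the deployment.
sample = {
    "core": {
        "parallelism": ("32", "airflow.cfg"),
        "dag_concurrency": ("16", "env var"),
    },
    "scheduler": {"job_heartbeat_sec": ("5", "default")},
}

for row in config_rows(sample):
    print(" | ".join(row))
```

Each printed line corresponds to one row of the web UI table, e.g. `core | parallelism | 32 | airflow.cfg`.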

mik-laj (Member) commented Jun 25, 2020

I'm not sure we will be able to keep this field backward compatible. In my opinion, its value in the API is low because it describes something the API client cannot influence in any way. It may help with debugging, but the main goal of the API is to facilitate management, not troubleshooting.

The Job table is a similar case: it is not exposed in the API. Access to it helps with troubleshooting, but it is not relevant to third-party systems, so it was left out of the API specification. Every field and endpoint in the API is opt-in, not opt-out, to make backward compatibility easier to maintain.

When deciding on a field, ask whether it will still be relevant when you have 100 Airflow instances. At that scale you need a different view of the data stored in the system. You may care about the value of a configuration option, e.g. to compare instances, but where that value came from is a technical detail.

In the future we can add endpoints that expose more detailed data, but they will have to be clearly marked to indicate their level of stability.
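One way to read the "opt-in, not opt-out" rule is as an explicit allow-list at serialization time: a field reaches API clients only if it is named in the public schema, so internal details such as the source stay out of the response unless someone deliberately adds them. The sketch below is a hypothetical illustration of that idea using a plain dataclass; it is not the PR's ConfigSchema, and the names PUBLIC_FIELDS and to_public_dict are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ConfigOption:
    key: str
    value: str
    source: str  # internal detail, e.g. "airflow.cfg" or "env var"


# Opt-in: only fields listed here are exposed. Adding a field is a deliberate,
# reviewable change, so fields that are never listed never become API contract.
PUBLIC_FIELDS: Tuple[str, ...] = ("key", "value")


def to_public_dict(option: ConfigOption) -> Dict[str, str]:
    """Serialize only the allow-listed fields of an option."""
    return {name: getattr(option, name) for name in PUBLIC_FIELDS}


print(to_public_dict(ConfigOption("parallelism", "32", "env var")))
```

With an opt-out design (serialize everything, then exclude), a new internal attribute would leak into the API by default; the allow-list makes the opposite the default.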

22quinn (Contributor Author) commented Jun 25, 2020

I see where you're coming from, but I'm not clear on the main use case for this endpoint. Could you give a specific example of what it might be used for? For instance, what would people do after querying GET /config on 100 Airflow instances?

Member commented

Airflow has options that have a big impact on instance performance and resource usage.

parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 16
dag_file_processor_timeout = 50
scheduler_heartbeat_sec = 5
job_heartbeat_sec = 5
processor_poll_interval = 1
min_file_process_interval = 0
dag_dir_list_interval = 300
etc.

Users may want to read these values and combine them with data from other monitoring systems (e.g. Stackdriver, Zabbix, Prometheus), such as average CPU or memory usage. This makes it possible to recommend changes that improve the health of the instance.
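On the client side, the multi-instance use case above could look roughly like this: fetch GET /config from each instance, flatten the response, and report the watched options whose values differ. This is a sketch under assumptions: the JSON shape ({"sections": [{"name": ..., "options": [{"key": ..., "value": ...}]}]}) is inferred from the sections/options/key/value attributes used in this thread, the responses are hard-coded instead of fetched over HTTP, and the instance names and helper functions are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

OptionId = Tuple[str, str]  # (section, key)


def flatten(config: dict) -> Dict[OptionId, str]:
    """Map (section, key) -> value for one instance's config response."""
    return {
        (section["name"], option["key"]): option["value"]
        for section in config["sections"]
        for option in section["options"]
    }


def differing_options(
    configs: Dict[str, dict], watched: Iterable[OptionId]
) -> Dict[OptionId, Dict[str, Optional[str]]]:
    """Return watched options whose values are not identical on all instances."""
    flat = {name: flatten(cfg) for name, cfg in configs.items()}
    result = {}
    for option_id in watched:
        values = {name: f.get(option_id) for name, f in flat.items()}
        if len(set(values.values())) > 1:
            result[option_id] = values
    return result


# Hard-coded, illustrative responses from two hypothetical instances.
responses = {
    "prod-eu": {"sections": [
        {"name": "core", "options": [{"key": "parallelism", "value": "32"}]},
        {"name": "scheduler", "options": [{"key": "dag_dir_list_interval", "value": "300"}]},
    ]},
    "prod-us": {"sections": [
        {"name": "core", "options": [{"key": "parallelism", "value": "64"}]},
        {"name": "scheduler", "options": [{"key": "dag_dir_list_interval", "value": "300"}]},
    ]},
}

watched = [("core", "parallelism"), ("scheduler", "dag_dir_list_interval")]
print(differing_options(responses, watched))
```

Only `("core", "parallelism")` is reported here; those differences could then be joined with CPU or memory metrics from the monitoring system to decide which settings to adjust.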

Contributor Author commented

Thanks. I removed the source field.

22quinn (Contributor Author) commented Jun 25, 2020

Hi @mik-laj, I need some help with the unit test. The real airflow.configuration.conf object contains many sections and options, so the expected API response is too long to put in the test, and it would be hard to maintain because the default config can change. I want to mock it with a small config instead. I tried both pytest's monkeypatch and mock.patch, but the API still returns the original config. Any ideas?

mik-laj (Member) commented Jun 25, 2020

@zikun I'm starting to look at it

mik-laj (Member) commented Jun 25, 2020

I think we need to give up one response format.
spec-first/connexion#860

mik-laj (Member) commented Jun 25, 2020

@zikun Here is an example of testing using mock
mik-laj@48e4127

mik-laj (Member) commented Jun 25, 2020

> I think we need to give up one response format.
> spec-first/connexion#860

This is weird, because the DAG Source and Log endpoints use different response types and it seems to work there.

22quinn (Contributor Author) commented Jun 25, 2020

> @zikun Here is an example of testing using mock
> mik-laj@48e4127

Thanks a lot for the example. My attempt failed because I was mocking the conf variable; mocking the as_dict function works!
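A likely explanation for why patching the conf variable had no effect (an assumption, since the endpoint code is not shown in this thread): if the endpoint module does `from airflow.configuration import conf`, it holds its own reference to the object, so rebinding airflow.configuration.conf does not change what the endpoint sees. Patching a method on the shared object, as with as_dict, affects every reference. The self-contained sketch below reproduces this with a fake module; FakeConf, fake_configuration and get_config are hypothetical stand-ins. Patching the name in the module that uses it (the endpoint module's own conf) should also work.

```python
import sys
import types
from unittest import mock


class FakeConf:
    """Hypothetical stand-in for Airflow's configuration object."""

    def as_dict(self):
        return {"core": {"parallelism": "32", "dag_concurrency": "16"}}


# Simulate a configuration module that exposes a module-level `conf` object.
fake_configuration = types.ModuleType("fake_configuration")
fake_configuration.conf = FakeConf()
sys.modules["fake_configuration"] = fake_configuration

# Endpoint-style import: binds a name in *this* module to the same object.
from fake_configuration import conf  # noqa: E402


def get_config():
    """Endpoint-style function that reads the locally bound `conf`."""
    return conf.as_dict()


SMALL_CONF = {"core": {"parallelism": "1"}}

# 1) Rebinding the module attribute: get_config() still uses the old object.
small_conf_mock = mock.Mock(**{"as_dict.return_value": SMALL_CONF})
with mock.patch.object(fake_configuration, "conf", small_conf_mock):
    rebinding_result = get_config()

# 2) Patching the method on the shared object: every reference sees it.
with mock.patch.object(fake_configuration.conf, "as_dict", return_value=SMALL_CONF):
    method_patch_result = get_config()

print(rebinding_result["core"]["parallelism"])  # 32 (original config)
print(method_patch_result["core"]["parallelism"])  # 1 (mocked config)
```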

> I think we need to give up one response format.
> zalando/connexion#860
>
> This is weird, because the DAG Source and Log endpoints use different response types and it seems to work there.

I just tested both the JSON and text/plain response types, and the JSON test failed. I'm now looking at the DAG Source and Log PRs to find the differences that cause the failure.

22quinn force-pushed the api-config-endpoint branch from c18722d to 344943b June 25, 2020 10:38
22quinn changed the title [WIP] Add read-only Config endpoint to Add read-only Config endpoint Jun 25, 2020
22quinn marked this pull request as ready for review June 25, 2020 11:00
22quinn (Contributor Author) commented Jun 25, 2020

I fixed the JSON test. It now works for both text/plain and JSON.

Only one pylint check is still failing:

tests/api_connexion/endpoints/test_config_endpoint.py:45:8: W0201: Attribute 'client' defined outside init (attribute-defined-outside-init)

I converted the tests from unittest to pytest, since I remember a discussion about moving from unittest to pytest.
Can I make client a class attribute by moving it to setup_class()? Was there a reason to put it in setUp() rather than setUpClass() in the unittest version?

22quinn (Contributor Author) commented Jun 25, 2020

All checks passed @mik-laj

mik-laj (Member) commented Jun 25, 2020

I'm done for today. Please ping me tomorrow.

Comment on lines 45 to 50
config_text = '\n'.join(
f'[{config_section.name}]\n' +
''.join(f'{config_option.key} = {config_option.value} # source: {config_option.source}\n'
for config_option in config_section.options)
for config_section in config.sections
)
Member commented

What do you think about using some helper methods, like:

def _make_single_record(config_option):
    return f'{config_option.key} = {config_option.value}  # source: {config_option.source}\n'

def _make_single_section(config_section):
    # join the rendered options into one string instead of embedding a generator in the f-string
    return f'[{config_section.name}]\n' + ''.join(_make_single_record(o) for o in config_section.options)

def _config_to_plain_text(config):
    return '\n'.join(_make_single_section(s) for s in config.sections)

Member commented

I don't see the benefit of splitting it up at that granularity.

I think we can split it in a different way.

# assumes: from flask import Response, request
serializer = {
    'text/plain': func1,  # placeholder: renders Config as plain text
    'application/json': func2,  # placeholder: renders Config as JSON
}
conf_dict = conf.as_dict()
config = conf_dict_to_config(conf_dict)
return_type = request.accept_mimetypes.best_match(list(serializer))
if return_type not in serializer:  # also covers best_match() returning None
    return Response(status=406)
config_text = serializer[return_type](config)
return Response(config_text, headers={'Content-Type': return_type})

Contributor Author commented

Thanks. I think both of your suggestions are good. We can combine them.

Breaking the code into smaller functions helps, especially since they handle different scopes, much like the nested classes in ConfigSchema. One benefit I can think of: if we later offer finer-grained endpoints such as /config/{section}/{option}, we can reuse those small functions.
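Combined, the two review suggestions could look roughly like the sketch below: small per-scope helpers for the text format, a JSON serializer, and a dict mapping each supported MIME type to its serializer, with 406 for anything else. It is a self-contained approximation, not the merged code. The NamedTuple classes stand in for the PR's config objects, all function names are hypothetical, and the option source is omitted because it was dropped from the schema earlier in the review. For simplicity, render takes an already-chosen MIME type instead of negotiating it with Flask's request.accept_mimetypes.best_match.

```python
import json
from typing import Callable, Dict, List, NamedTuple, Tuple


class ConfigOption(NamedTuple):
    key: str
    value: str


class ConfigSection(NamedTuple):
    name: str
    options: List[ConfigOption]


class Config(NamedTuple):
    sections: List[ConfigSection]


def conf_dict_to_config(conf_dict: Dict[str, Dict[str, str]]) -> Config:
    """Convert {section: {key: value}} into the nested Config structure."""
    return Config(sections=[
        ConfigSection(name, [ConfigOption(k, v) for k, v in options.items()])
        for name, options in conf_dict.items()
    ])


def option_to_text(option: ConfigOption) -> str:
    return f"{option.key} = {option.value}\n"


def section_to_text(section: ConfigSection) -> str:
    return f"[{section.name}]\n" + "".join(option_to_text(o) for o in section.options)


def config_to_text(config: Config) -> str:
    return "\n".join(section_to_text(s) for s in config.sections)


def config_to_json(config: Config) -> str:
    return json.dumps({
        "sections": [
            {"name": s.name, "options": [{"key": o.key, "value": o.value} for o in s.options]}
            for s in config.sections
        ]
    })


SERIALIZERS: Dict[str, Callable[[Config], str]] = {
    "text/plain": config_to_text,
    "application/json": config_to_json,
}


def render(config: Config, mimetype: str) -> Tuple[int, str]:
    """Return (HTTP status, body); 406 Not Acceptable for unsupported types."""
    serializer = SERIALIZERS.get(mimetype)
    if serializer is None:
        return 406, ""
    return 200, serializer(config)


config = conf_dict_to_config({"core": {"parallelism": "32"}, "scheduler": {"job_heartbeat_sec": "5"}})
print(render(config, "text/plain")[1])
```

A per-option endpoint such as /config/{section}/{option} could then reuse option_to_text directly.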

Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
22quinn force-pushed the api-config-endpoint branch from e2e0f35 to 045fbbb June 26, 2020 03:56
22quinn force-pushed the api-config-endpoint branch from 045fbbb to b9dcc7d June 26, 2020 04:00
22quinn requested a review from mik-laj June 26, 2020 11:42
mik-laj merged commit f729cfd into apache:master Jun 26, 2020
22quinn deleted the api-config-endpoint branch June 27, 2020 02:39
kaxil pushed a commit to kaxil/airflow that referenced this pull request Jun 27, 2020
Co-authored-by: Kamil Breguła <kamil.bregula@polidea.com>
Co-authored-by: Tomek Urbaszek <turbaszek@apache.org>
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
Labels: area:API (Airflow's REST/HTTP API)
Development: successfully merging this pull request may close the issue "API Endpoint - Config"
3 participants